Abstract: Web applications written in JavaScript are regularly used for
dealing with sensitive or personal data. Consequently, reasoning about
their security properties has become an important problem, which is made
very difficult by the highly dynamic nature of the language, particularly
its support for runtime code generation. As a first step towards dealing
with this, we propose to investigate security analyses for languages with
more principled forms of dynamic code generation. To this end, we present
a static information flow analysis for a dynamically typed functional
language with prototype-based inheritance and staged metaprogramming. We
prove its soundness, implement it and test it on various examples designed
to show its relevance to proving security properties, such as
noninterference, in JavaScript. To our knowledge, this is the first fully
static information flow analysis for a language with staged
metaprogramming, and the first formal soundness proof of a CFA-based
information flow analysis for a functional programming language.

Keywords: noninterference; staged metaprogramming; CFA;
information flow; dynamically typed languages; JavaScript;
static analysis

I. Introduction

An information flow analysis determines which values in 
a program can influence which parts of the result of the 
program. Using an information flow analysis, we can, for 
instance, prove that program inputs that are deemed high 
security do not influence low security outputs; this important
security property is known as noninterference [1].

Early work on noninterference focused mainly on applica- 
tions in a military or government setting, where there might 
be strict rules about security clearance and classification of 
documents. More recently, there has been increased interest 
in information security (and hence its analysis) for Web 
applications, particularly for Web 2.0 applications written 
in JavaScript. 

We have developed a static information flow analysis
for a dynamically typed, pure, functional language with
stage-based metaprogramming [2]; we call the language
SLamJS (Staged Lambda JS) because it exhibits a number
of JavaScript's interesting features in an idealised, lambda
calculus-based setting [3]. The analysis is based on the idea
of extending a constraint-based formulation of the analysis
0CFA [4] with constraints to track information flow. We
believe that the idea could be extended to other CFA-style
analyses (such as CFA2 [5]) for improved precision. We have
formally proved the correctness of our analysis; we have also
implemented it and tested it on a number of examples.

Supporting material, which includes mechanisations of
our key results in the theorem prover Coq and an
implementation of our analysis in OCaml, is available online at
http://mjolnir.cs.ox.ac.uk/web/slamjs/.

The structure of the remainder of the paper is as fol- 
lows. In Section II, we present SLamJS: we begin with an 
explanation of why we believe our chosen combination of
language features is relevant to information security in Web 
applications. Next, we present the semantics of SLamJS and 
explain, using an augmented semantics, what information 
flow means in this language. Section III explains how the 
analysis works and how we proved its correctness. We 
discuss our implementation and some examples on which 
we have tested the analysis in Section IV. In Section V, 
we examine the gap between our work and a practical 
analysis for real-world Web applications. We also discuss 
other research on analysis of information flow and staged 
metaprogramming, before concluding in Section VI. 

II. The Language SLamJS 

A. Motivation 

The new arena of Web applications presents many inter- 
esting challenges for information flow analysis. While there 
is an extensive body of research on information flow in stati- 
cally typed languages [6], there is little tackling dynamically 
typed languages. The semantics of JavaScript are complex 
and poorly understood [7], which makes any formal analysis
difficult. Web applications frequently comprise code from 
multiple sources (including libraries and adverts), written 
by multiple authors in an ad-hoc style. They are often 
interactive (so cannot be viewed as a single execution with 
inputs and outputs) and it might not be known in advance 
which code will be loaded. 



The eval construct of JavaScript, which allows execution 
of arbitrary code strings, is particularly troublesome, to 
the extent that many analyses just ignore it. However, a 
recent survey shows that real JavaScript code uses eval 
extensively [8]. Its uses vary widely from straightforward 
(loading data via JSON) through ill-informed (accessing 
fields of an object without using array notation) to sub- 
tle (changing scoping behaviour) and complex (emulating 
higher order functions). We think that it is important to 
develop techniques for analysing this notorious construct. 

So that we might reasonably work formally, we have 
developed a simplified language called SLamJS. The language
is heavily influenced by λJS, a "core calculus" for
JavaScript [3]. Like JavaScript, SLamJS is dynamically
typed and features first-class functions and objects with 
prototype-based inheritance. Like JavaScript, it allows code 
to be constructed, passed around and executed at run- 
time. Unlike JavaScript, this is achieved using Lisp-style 
code quotations rather than code strings [9]. Recent work 
indicates that real-world usage of eval is often of a form 
that could be expressed using code quotations [10]. Thus
analysis of programs with executable code quotations is an 
important step towards analysis of programs with executable 
code strings. 

B. Syntax and Semantics of SLamJS 

1) Syntax: SLamJS is a functional language with atomic
constants, records, branching, first-class functions and staged
metaprogramming; the syntax is given in Figure 1.

The language has five types of atomic constant: booleans, 
strings, numbers and two special values (undef and null) 
to indicate undefined or null values. A record {s : v} is 
a finite mapping from fields (named by strings) to values. 
Fields can be read (e[e]), updated or replaced (e[e] = e) and 
deleted (del e[e]). Records support prototype-based lookup: 
a read from an undefined field of a record is redirected to 
the corresponding field on the record held in its "_proto_" 
field, if there is one. 

Branching on boolean values is enabled by the 
if(e){e} else{e} construct. Functions can be defined 
(fun(x){e}) and applied (e(e)). 

Staged metaprogramming is supported through use of the
box, unbox and run constructs. box $e_1$ turns $e_1$ into a
"quoted" or "boxed" code value, which can be executed
using run. The use of unbox $e_2$ within a boxed expression
$e_1$ forces evaluation of $e_2$ to a boxed value, which is spliced
into $e_1$ before it becomes a boxed value.

Expressions of the form (e, ρ) and run e in ρ only
arise as intermediate terms during execution: the former
represents an explicit substitution where all free variables
of the expression e are given their value by the environment
ρ; the latter represents an expression to be unboxed and
evaluated in environment ρ.



Values exist at all stages. Constants, records with constant 
fields and constant code quotations are values at every stage; 
closures are only values at stage zero. Other constructs may 
be values at higher stages, provided that their subexpressions 
are values at the appropriate stage. We generally omit the 
stage superscript for values of stage zero. 

2) Semantics: We give a small-step operational semantics
with evaluation contexts and explicit substitutions for
SLamJS. There are two reduction relations,
$\overset{n}{\dashrightarrow}$ and $\xrightarrow{n}$,
each annotated with a level $n$. The former is for top-level
reduction, while the latter is for evaluation under a context.

Evaluation contexts: In a staged setting, evaluation contexts
may straddle stage boundaries, hence they are annotated
with stage subscripts and superscripts. A context $C^m_n$
denotes a hole at stage $n$ inside an expression at stage $m$. For
a context $C^m_n$ and an expression $e$, we denote by
$C^m_n\langle e \rangle$ the expression obtained by plugging $e$
into the hole contained in $C^m_n$. The grammar of some key
evaluation contexts is given in Figure 2; full details are in Appendix A.

Reduction rules: Top-level reduction rules fall into two
categories: environment propagation rules for pushing explicit
substitutions inwards (Figure 3), and proper reduction rules
(Figure 4). The former are fairly straightforward, so full
details are left for Appendix A. Note that explicit substitutions
only apply at stage zero, hence (x, ρ) evaluates to x at level
n + 1 without looking up x in ρ. Furthermore, observe that
(run e, ρ) pushes its environment into e, allowing boxed
code values to capture variables from outside.

The proper reduction rules are also quite standard [9], 
except for the field access rules, which are designed to mimic 
JavaScript semantics as far as possible. 

In particular, every record is expected to have a
"_proto_" field, which holds either the value null or
another record, giving rise to a chain of prototype objects
that ultimately ends in null. Reading a record field follows
this chain by rule (Read2), until the field is either found
(Read1), or the top of the chain is reached, where (Read3)
yields undef. Note that the reduction $\dashrightarrow$ can get
stuck, for example, when applying a non-function, or branching
on a non-boolean.
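The prototype-chain lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's semantics: records are modelled as Python dictionaries, `None` stands in for null, and the names `UNDEF` and `read_field` are hypothetical.

```python
UNDEF = object()  # hypothetical stand-in for SLamJS's undef value


def read_field(record, field):
    """Follow the "_proto_" chain until the field is found or the chain ends."""
    while record is not None:            # None models null at the top of the chain
        if field in record:              # cf. (Read1): the field is present
            return record[field]
        record = record.get("_proto_")   # cf. (Read2): retry on the prototype
    return UNDEF                         # cf. (Read3): chain exhausted


base = {"_proto_": None, "greeting": "hello"}
derived = {"_proto_": base, "name": "child"}
```

Here `read_field(derived, "greeting")` is answered by the prototype `base`, while a field defined nowhere on the chain yields `UNDEF`.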

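There is only a single rule for $\xrightarrow{m}$:

$$C^m_n\langle e \rangle \xrightarrow{m} C^m_n\langle e' \rangle \quad \text{if } e \overset{n}{\dashrightarrow} e'$$

We write $\rightarrow$ for the union over all $m$ of $\xrightarrow{m}$, and
$\rightarrow^{*}$ for its reflexive, transitive closure.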

Example 1: Here is an evaluation trace of a simple if
statement. We use ε to stand for the empty environment.

(if(true){false} else{1}, ε)
→ if((true, ε)){(false, ε)} else{(1, ε)}
→ if(true){(false, ε)} else{(1, ε)}
→ (false, ε) → false
Figure 1. Syntax of SLamJS

Figure 2. Selected evaluation contexts

Figure 3. Selected environment propagation rules

Figure 4. Proper reduction rules (including (Lookup) and (Apply))

Figure 5. Semantic rules for lifts



Example 2: The staging constructs in SLamJS allow fragments
of code to be treated as values and spliced together
or evaluated at run-time, as shown in this evaluation trace.

(run (box (if(unbox (box (true))){false} else{1})), ε)
→ run (box (if(unbox (box (true))){false} else{1}), ε) in ε
→* run (box (if(unbox (box (true))){(false, ε)} else{(1, ε)})) in ε
→ run (box (if(true){(false, ε)} else{(1, ε)})) in ε
→* run (box (if(true){false} else{1})) in ε
→ (if(true){false} else{1}, ε)
→ if((true, ε)){(false, ε)} else{(1, ε)}
→ if(true){(false, ε)} else{(1, ε)} → (false, ε) → false

Example 3: Our staging constructs allow variables to be
captured by code values originating outside their scope.
Here, the code value box y is outside the scope of y, but
captures it during evaluation.

(((fun(x){(fun(y){run x})})(box y))(true), ε)
→* (fun(y){run x}, (x ↦ box y))(true)
→ (run x, (y ↦ true, x ↦ box y))
→ run (x, (y ↦ true, x ↦ box y)) in (y ↦ true, x ↦ box y)
→ run (box y) in (y ↦ true, x ↦ box y)
→ (y, (y ↦ true, x ↦ box y))
→ true

This useful feature is vital for modelling certain uses of eval;
the above code corresponds to this JavaScript:
((function (x) { return function (y) {
    return eval(x); }; })("y"))(true);
However, the power comes at a price: the usual alpha
equivalence property of λ-calculus does not hold in SLamJS,
which makes reasoning about programs harder.
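The same name capture, and the resulting failure of alpha equivalence, can be observed with Python's built-in `eval`. The sketch below is an analogy only (Python's scoping rules differ from both SLamJS and JavaScript), and the names `capture`, `renamed` and `alpha_equivalent` are ours.

```python
# The string "y" is created outside the scope of y, yet eval resolves it
# against the local variables of the inner function that calls eval.
capture = (lambda x: (lambda y: eval(x)))("y")(True)


# Renaming the bound variable y to z is not behaviour-preserving: the code
# string still mentions "y", which is unbound when eval runs.
def renamed():
    return (lambda x: (lambda z: eval(x)))("y")(True)


try:
    renamed()
    alpha_equivalent = True
except NameError:
    alpha_equivalent = False
```

The first expression evaluates to `True`, whereas the renamed variant raises `NameError`, so the two alpha-variants are distinguishable.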



C. Augmented Semantics of SLamJS 

The result of a program can depend on its component
values in essentially two different ways. Consider programs
operating on two variables l and h. The program
(if(l){h} else{1}) may evaluate to the value of h (if l is
true); we say that there is a direct flow from h to the
result. Conversely, the program (if(h){true} else{1}) cannot
evaluate to h. However, the result of evaluation tells us
whether h was true or false because h influences the control
flow of the program; there is an indirect flow from h to the
result of the program.

In order to track the dependency of a result on its 
component subexpressions, we augment the language with 
explicit dependency markers [11], [12]. We also introduce 
new rules for lifting markers into their parent expressions 
to avoid losing information about dependencies. The aug- 
mented semantics is not intended for use in the execution 
of programs; rather, we use it for analysing and reasoning 
about dependencies in the original language. We begin by 
adding markers to the syntax: 

Markers      m ∈ Marker
Expressions  e ::= ... | (m : e)
Values       v^n ::= ... | (m : v^n)

We extend contexts to allow evaluation within a marked
expression:

$$C^m_n ::= \ldots \mid (m : C^m_n)$$

We allow propagation of environments within marked
expressions:

$$(m : e, \rho) \overset{n}{\dashrightarrow} (m : (e, \rho))$$

In Figure 5 we introduce lifts to maintain a record of
indirect flows. Note there is no need for a lift rule on the
right of an assignment (i.e., $v_1[v_2] = (m : v_3) \dashrightarrow (m : v_1[v_2] = v_3)$),
since the flow from $v_3$ is direct.



Example 4: Recall Example 1. Suppose we add markers
to each of the components of the if. The evaluation trace
now becomes:

(if((H : true)){(L : false)} else{(I : 1)}, ε)
→* if((H : true)){((L : false), ε)} else{((I : 1), ε)}
→ (H : (if(true){((L : false), ε)} else{((I : 1), ε)}))
→ (H : ((L : false), ε))
→* (H : (L : false))

Note how the markers H and L in the result indicate that it
depends on the marked values (H : true) and (L : false).
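The effect of lifting a marker out of a branch condition can be imitated with a small Python sketch. It is a simplification under our own assumptions: nested markers are collapsed into a set, branches are passed as thunks, and the names `Marked`, `mark` and `if_` are hypothetical rather than part of SLamJS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marked:
    markers: frozenset  # markers the value depends on
    value: object       # the underlying unmarked constant


def mark(m, v):
    """Wrap v (already marked or not) with one more marker, like (m : v)."""
    if isinstance(v, Marked):
        return Marked(v.markers | {m}, v.value)
    return Marked(frozenset({m}), v)


def if_(cond, then_branch, else_branch):
    """Branch on cond and lift cond's markers onto the result (indirect flow)."""
    if not isinstance(cond, Marked):
        cond = Marked(frozenset(), cond)
    chosen = then_branch() if cond.value else else_branch()
    if not isinstance(chosen, Marked):
        chosen = Marked(frozenset(), chosen)
    for m in cond.markers:
        chosen = mark(m, chosen)
    return chosen


# Marked version of the conditional from Example 4.
result = if_(mark("H", True), lambda: mark("L", False), lambda: mark("I", 1))
```

As in the trace, the result carries the markers H and L, while the marker I of the branch that was not taken does not appear.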

Example 5: Here is an example of marked evaluation 
with functions: 

(((fun(x){I : (fun(y){x})})(H : 1))(L : 2), ε) →* (I : (H : 1))

Observe that the result depends on I because the function
(I : (fun(y){x})) was used to compute it, but not on L, as
(L : 2) is discarded by that function.

Simulation: Consider a function unmark, defined in the 
obvious way, which strips an expression of all markers. 
Clearly if $\mathit{unmark}(e_1) = f_1 \rightarrow f_2$, then for some $e_2$ such
that $e_1 \rightarrow^{*} e_2$, we have $\mathit{unmark}(e_2) = f_2$.

III. Information Flow Analysis for SLamJS 
A. Overview 

Before we can define an information flow analysis, we 
need to define what information flow is. Following Pottier 
and Conchon [11], we use the idea that if information does
not flow from a marked expression into a value resulting 
from evaluation, then erasing that marked expression or 
replacing it with a dummy value should not affect the result 
of evaluation. (We use only their proof technique; their type- 
based analysis is not applicable to our language.) We begin 
in Section III-B by defining erasure and establishing some 
results about its behaviour. 

Our information flow analysis is built on top of a 0CFA-style
analysis capable of handling our staging constructs.
Two variants of such an analysis are explained in Sec- 
tion III-C; mechanised correctness proofs in Coq are avail- 
able online. 

In Section III-D, we present the information flow analysis 
itself. A key idea in CFA is that control flow influences data 
flow and vice versa. Information flow depends on control 
and data flow, but the reverse is not true. Therefore it is 
possible to treat information flow analysis as an addition 
to CFA, rather than a completely new combined analysis. 
We have two versions of the CFA, each of which yields an 
information flow analysis. We sketch a correctness proof of 
the simpler analysis; complete mechanised proofs of both 
are available online. 

Finally, in Section III-E, we prove soundness of the 
information flow analysis. We also discuss its relationship 
with noninterference. 



B. Erasure and Stability 

1) Erasure and Prefixes: We extend the language with a
"hole" that behaves like an unbound variable:

Expressions  e ::= ... | _
Values       v^n ::= ...



Now for $M \subseteq \mathit{Marker}$, define the $M$-erasure of $e$, written
$\lfloor e \rfloor_M$, to be: $e$ with any subexpression $(m : e')$ where
$m \notin M$ replaced by _. A full definition is in Appendix A.
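As a rough illustration of $M$-erasure, the following Python sketch works over a tiny tuple-encoded fragment of the language (functions, applications, variables, constants and marked expressions). The encoding and the names `HOLE` and `erase` are assumptions made for this sketch; the full definition in the paper covers every construct.

```python
HOLE = ("hole",)  # the hole _, encoded as a one-element tuple


def erase(e, keep):
    """Replace every subexpression (m : e') whose marker is not in `keep` by _."""
    tag = e[0]
    if tag == "mark":
        _, m, body = e
        return ("mark", m, erase(body, keep)) if m in keep else HOLE
    if tag == "fun":
        _, x, body = e
        return ("fun", x, erase(body, keep))
    if tag == "app":
        _, f, a = e
        return ("app", erase(f, keep), erase(a, keep))
    return e  # variables, constants and holes are left unchanged


# The initial expression of Example 5: ((fun(x){I : (fun(y){x})})(H : 1))(L : 2)
example5 = ("app",
            ("app",
             ("fun", "x", ("mark", "I", ("fun", "y", ("var", "x")))),
             ("mark", "H", ("const", 1))),
            ("mark", "L", ("const", 2)))

erased = erase(example5, {"H", "I"})
```

Erasing with respect to {H, I} replaces only the argument (L : 2) by a hole and leaves the rest of the expression intact.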

2) Prefixing and Monotonicity: We say that $e_1$ is a prefix
of $e_2$, written $e_1 \preceq e_2$, if replacing some subexpressions of
$e_2$ with _ gives $e_1$.

Evaluation is monotonic with respect to prefixing: if $e_1 \preceq e_2$
and $e_1 \rightarrow^{*} f$, where $f$ contains no _, then $e_2 \rightarrow^{*} f$.
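The prefix relation is straightforward to check structurally. The sketch below assumes expressions are encoded as nested tuples with a leading tag and that the hole is the tuple `("hole",)`; both the encoding and the name `is_prefix` are hypothetical.

```python
HOLE = ("hole",)  # the hole _, encoded as a one-element tuple


def is_prefix(e1, e2):
    """True if e1 can be obtained from e2 by replacing subexpressions with _."""
    if e1 == HOLE:
        return True                      # a hole is a prefix of anything
    if not (isinstance(e1, tuple) and isinstance(e2, tuple)):
        return e1 == e2                  # atoms (names, constants) must agree
    if len(e1) != len(e2) or e1[0] != e2[0]:
        return False                     # different constructors
    return all(is_prefix(a, b) for a, b in zip(e1[1:], e2[1:]))


full = ("app", ("const", 1), ("const", 2))
partial = ("app", HOLE, ("const", 2))
```

With these definitions `partial` is a prefix of `full`, but not the other way round, and every expression is a prefix of itself.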

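Lemma 1 (Step Stability): If $e_1 \overset{n}{\dashrightarrow} e_2$, then either
$\lfloor e_1 \rfloor_M \overset{n}{\dashrightarrow} \lfloor e_2 \rfloor_M$ or the
reduction rule applied to derive this is a lift (Lift-*) of a marker $m \notin M$.

Proof: By induction over the rules defining $\overset{n}{\dashrightarrow}$. ∎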

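Theorem 1 (Stability): Consider an expression $e_1$ (which
may use _) and a _-free expression $e_2$ such that $e_1 \rightarrow^{*} e_2$.
Then for every $M \subseteq \mathit{Marker}$ such that $\lfloor e_2 \rfloor_M = e_2$, it
follows that $\lfloor e_1 \rfloor_M \rightarrow^{*} \lfloor e_2 \rfloor_M$.

Proof: Consider any $e_2$ and $M$ with $\lfloor e_2 \rfloor_M = e_2$. Aim
to prove, for any $e_1$ with $e_1 \rightarrow^{*} e_2$, that
$\lfloor e_1 \rfloor_M \rightarrow^{*} e_2$. Argue by induction over the length $k$
of derivations of $e_1 \rightarrow^{*} e_2$.

Base case: $k = 0$. So $e_1 = e_2$. We have $\lfloor e_2 \rfloor_M = e_2$, so
trivially $\lfloor e_1 \rfloor_M = e_2$.

Inductive step: $k = k' + 1$. Given $e_1 \rightarrow e \rightarrow^{k'} e_2$, aim
to prove $\lfloor e_1 \rfloor_M \rightarrow^{*} e_2$. Assume by the induction
hypothesis that $\lfloor e \rfloor_M \rightarrow^{*} e_2$. Let
$e_1 = C^m_n\langle f_1 \rangle$ and $e = C^m_n\langle f \rangle$ with
$f_1 \overset{n}{\dashrightarrow} f$. Case split on whether
$f_1 \overset{n}{\dashrightarrow} f$ is a lift of a marker $m \notin M$.

If it is such a lift, then let $f = (m : f')$. Now $\lfloor f \rfloor_M = \_$,
so $\lfloor f \rfloor_M \preceq \lfloor f_1 \rfloor_M$. Thus
$\lfloor C^m_n\langle f \rangle \rfloor_M \preceq \lfloor C^m_n\langle f_1 \rangle \rfloor_M$;
that is, $\lfloor e \rfloor_M \preceq \lfloor e_1 \rfloor_M$. We already have (from
the induction hypothesis) that $\lfloor e \rfloor_M \rightarrow^{*} e_2$. Now,
applying Monotonicity, we get $\lfloor e_1 \rfloor_M \rightarrow^{*} e_2$.

Otherwise, apply the Step Stability Lemma to get
$\lfloor f_1 \rfloor_M \overset{n}{\dashrightarrow} \lfloor f \rfloor_M$. It follows
that $\lfloor C^m_n\langle f_1 \rangle \rfloor_M \rightarrow \lfloor C^m_n\langle f \rangle \rfloor_M$;
that is, $\lfloor e_1 \rfloor_M \rightarrow \lfloor e \rfloor_M$. Using the induction
hypothesis gives $\lfloor e_1 \rfloor_M \rightarrow \lfloor e \rfloor_M \rightarrow^{*} e_2$,
as required. ∎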

Example 6: Recall that in Example 5, the result depended
on H and I, but not L. Applying $\lfloor - \rfloor_{\{H, I\}}$ and evaluating the
initial expression gives:

(((fun(x){I : (fun(y){x})})(H : 1))(_), ε) →* (I : (H : 1))

That is, the result of evaluation is unchanged.



C. 0CFA for SLamJS

We use a context-insensitive, flow-insensitive control flow
analysis (0CFA [4]) to approximate statically the set of
values to which individual expressions in a program may
evaluate at runtime.

As far as 0CFA is concerned, the only non-standard
features of SLamJS are its staging constructs. Roughly
speaking, box and unbox/run act like function abstraction and
application, except that they use a dynamic (instead of static)
scoping discipline.¹

We present two variants of 0CFA for SLamJS: a simple,
but somewhat imprecise formulation that does not distinguish
like-named variables bound by different abstractions,
and a more complicated one that does.

Simple Analysis: Following Nielson, Nielson and Hankin [13],
we formalise our analysis by means of an acceptability
judgement of the form $\Gamma, \varrho \models e$, where $\Gamma$ is
an abstract cache associating sets of abstract values with
labelled program points, and $\varrho$ is an abstract environment
mapping local variables and record fields to sets of abstract
values. Intuitively, the purpose of this judgement is to ensure
that $\Gamma(\ell)$ soundly over-approximates all possible values to
which the expression at program point $\ell$ can evaluate, and
$\varrho$ does the same for variables and record fields.

More precisely, we assume that all expressions in the
program are labelled with labels drawn from a set Label. An
abstract cache is a mapping $\mathit{Label} \rightarrow \mathcal{P}(\mathit{AbsVal})$
associating a set of abstract values with every program point; similarly,
an abstract environment $\varrho : \mathit{AbsVar} \rightarrow \mathcal{P}(\mathit{AbsVal})$
maps abstract variables to sets of abstract values, where an abstract
variable is either a simple name $x$ (representing a function
parameter), or a field name of the form $\ell.p$, where $\ell$ is a
label representing a record, and $p$ is the name of a field of
that record.

Our domain of abstract values is mostly standard, with,
e.g., an abstract value NULL to represent the concrete null
value, an abstract NUM value representing any number, and
abstract values FUN(x, e), BOX(e) and REC($\ell$) representing,
respectively, a function value, a quoted piece of code, and a
record allocated at program point $\ell$. (A complete definition
of all our abstract domains is given in Appendix A.) For an
abstract environment $\varrho$ and a label $\ell$ we define
$\mathit{proto}(\ell)_\varrho$ to be the smallest set $P \subseteq \mathit{Label}$
such that $\ell \in P$ and for every $p \in P$ and
$\mathrm{REC}(\ell') \in \varrho(p.\text{"\_proto\_"})$ also $\ell' \in P$.
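The set $\mathit{proto}(\ell)_\varrho$ is a least fixed point and can be computed with a standard worklist. In the sketch below, the abstract environment is assumed to be a dictionary keyed by `(label, field)` pairs, and abstract values are `(kind, payload)` tuples; this representation and the name `proto` as a Python function are illustrative assumptions, not the representation used by the authors' tool.

```python
def proto(label, env):
    """Smallest set containing `label` and closed under abstract "_proto_" edges."""
    closure, worklist = {label}, [label]
    while worklist:
        p = worklist.pop()
        for kind, target in env.get((p, "_proto_"), set()):
            if kind == "REC" and target not in closure:
                closure.add(target)
                worklist.append(target)
    return closure


# Hypothetical abstract environment: record 1 may inherit from record 2,
# record 2 from record 3 (or, imprecisely, from record 1 again).
env = {
    (1, "_proto_"): {("REC", 2), ("NULL", None)},
    (2, "_proto_"): {("REC", 3), ("REC", 1)},
    (3, "_proto_"): {("NULL", None)},
}
```

Because each label is added to the closure at most once, the loop terminates even when the abstract prototype graph contains cycles.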

The acceptability judgement is now defined using syntax- 
directed rules, some of which are shown in Figure 6 (the re- 
maining rules, which are standard, are given in Appendix A). 

We write $t^\ell$ to represent an expression of the syntactic
form $t$, labelled with $\ell$. Thus, $k^\ell$ means an expression
consisting of a literal $k$ labelled $\ell$, and the first rule simply
says that in order for $\Gamma$ and $\varrho$ to constitute an acceptable
analysis of $k^\ell$, $\Gamma(\ell)$ must contain the abstract value
$\lceil k \rceil$ representing $k$. Similarly, the second rule requires
$\Gamma$ and $\varrho$ to be consistent in the abstract values they assign
to variables and references to them. The rules for dealing with function
abstractions and records are standard and so are elided here
for brevity.

¹ This intuition is made more precise in Choi et al.'s work on static
analysis of staged programs [9], where staging constructs are translated
into function abstraction and application; we prefer to work directly on the
staged language for simplicity.

The rule for box $e$ requires $\Gamma$ and $\varrho$ to be an acceptable
analysis of the single sub-expression $e$, and for $\Gamma(\ell)$ to
include an abstract value $\nu$ approximating box $e$, which is
written as $\Gamma, \varrho \models \nu \approx \mathbf{box}\ e$. This judgement
holds if $\nu = \mathrm{BOX}(e)$, but we must be slightly more flexible: during
evaluation, unboxing may splice new code fragments into
$e$, changing its syntactic shape to some new expression $e'$.
In order for the flow analysis to be effectively computable,
we want the set of abstract values to be finite, so we
cannot expect every such $\mathrm{BOX}(e')$ to be part of our abstract
domain. Instead, we close the approximation judgement
under reduction, that is, if $\Gamma, \varrho \models \nu \approx t$ and
$t \rightarrow t'$, then also $\Gamma, \varrho \models \nu \approx t'$; the full
definition of the approximation judgement appears in Figure 7.

The rule for unbox e is surprisingly simple: all that is
required is that for any abstract value BOX(e') that the
analysis thinks can flow into lbl(e) (i.e., the label of
expression e) every abstract value flowing into its body e' also
flows into the unboxing expression. Note that this models
the name capture associated with dynamic scoping, since 
our abstract environment g does not distinguish between 
different variables of the same name. The rule for run is 
the same as for unbox. 

Finally, we show the rule for if, which is standard: any 
abstract value that either of the branches can evaluate to is 
also a possible result of the entire if expression. 

To show this acceptability judgement makes sense, we
prove its coherence with evaluation:

Theorem 2 (CFA Coherence): If $\Gamma, \varrho \models e$ and $e \rightarrow e'$,
then $\Gamma, \varrho \models e'$.

The proof of this theorem is fairly technical and is elided 
here. A full formalisation in Coq is available online in our 
supporting material. 

Owing to its syntax-directed nature, the definition of the 
acceptability relation can quite easily be recast as constraint 
rules; by generating and solving all constraints for a given 
program, an acceptable flow analysis can be derived. 

Note that, while there may be infinitely many abstract 
values of the form BOX(e) and FUN(e) that are relevant 
to a particular program, the closure of the approximation 
judgement under reduction means that the analysis need 
only consider those corresponding to subexpressions e of the 
original program, not those that may arise during execution. 
That is, the analysis need only solve a finite set of constraints 
over a finite set of abstract values and a finite set of labels 
and abstract variables, so it can be guaranteed to terminate. 
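To make the constraint-solving view concrete, here is a minimal Python sketch of 0CFA for a plain labelled lambda calculus with numbers. It deliberately omits records, markers and the staging constructs, represents FUN values by parameter name and body label, and iterates naively to a fixed point rather than using a worklist; all names (`analyse`, `flow`, the tuple encoding) are assumptions of this sketch rather than the authors' OCaml implementation.

```python
from collections import defaultdict

# Labelled terms (hypothetical encoding):
#   ("num", l) | ("var", x, l) | ("fun", x, body, l) | ("app", f, a, l)


def label_of(t):
    return t[-1]


def subterms(t):
    yield t
    if t[0] == "fun":
        yield from subterms(t[2])
    elif t[0] == "app":
        yield from subterms(t[1])
        yield from subterms(t[2])


def analyse(program):
    cache = defaultdict(set)  # abstract cache: label -> abstract values
    env = defaultdict(set)    # abstract environment: variable -> abstract values
    terms = list(subterms(program))
    changed = True

    def flow(src, dst):
        """Enforce the subset constraint src <= dst by growing dst in place."""
        nonlocal changed
        if not src <= dst:
            dst |= src
            changed = True

    while changed:  # sets only grow and are bounded, so this terminates
        changed = False
        for t in terms:
            l = label_of(t)
            if t[0] == "num":
                flow({"NUM"}, cache[l])
            elif t[0] == "var":
                flow(env[t[1]], cache[l])
            elif t[0] == "fun":
                flow({("FUN", t[1], label_of(t[2]))}, cache[l])
            elif t[0] == "app":
                for v in list(cache[label_of(t[1])]):
                    if isinstance(v, tuple) and v[0] == "FUN":
                        _, param, body_label = v
                        flow(cache[label_of(t[2])], env[param])  # argument -> parameter
                        flow(cache[body_label], cache[l])        # body -> call site
    return dict(cache), dict(env)


# Unmarked version of Example 5: ((fun(x){fun(y){x}})(1))(2)
inner = ("fun", "y", ("var", "x", 0), 1)
outer = ("fun", "x", inner, 2)
prog = ("app", ("app", outer, ("num", 3), 4), ("num", 5), 6)
cache, env = analyse(prog)
```

On this program the analysis concludes that the whole expression (label 6) can only evaluate to a number, and that both `x` and `y` may be bound to numbers.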

Figure 6. Some rules for the 0CFA acceptability judgement

$\Gamma, \varrho \models \lceil k \rceil \approx k$ for any literal $k$

Figure 7. The approximation judgement $\Gamma, \varrho \models \nu \approx t$

Example 7: Recall again Example 5. Our implementation
of the analysis labels the expression as follows:

(((fun(x){(I : (fun(y){x^0})^1)^2})^3 (H : 1^4)^5)^6 (L : 2^7)^8)^9

By generating and solving constraints it gives the following
solution for Γ:

0 ↦ {NUM}    1 ↦ {FUN(y, x^0)}    2 ↦ {FUN(y, x^0)}
3 ↦ {FUN(x, (I : (fun(y){x^0})^1)^2)}    4 ↦ {NUM}
5 ↦ {NUM}    6 ↦ {FUN(y, x^0)}    7 ↦ {NUM}
8 ↦ {NUM}    9 ↦ {NUM}

while ϱ = {x ↦ {NUM}, y ↦ {NUM}}. As expected, the
result of evaluation (labelled 9) is a number.

Improved Analysis: The analysis presented so far is not 
very precise, since abstract environments do not distinguish 
identically named parameters of different functions. Ordi- 
narily, this is not a problem, as one can rename them apart, 
but this is not possible for SLamJS, which does not enjoy 
alpha conversion. 

To restore analysis precision in the absence of alpha 
conversion, we introduce an abstract context S that keeps 
track of name bindings. In a single-staged language, such 
an abstract context would simply map a name x to the 
innermost enclosing function abstraction whose parameter 
is x. In a multi-staged setting, we need to distinguish
between bindings at different stages, hence the abstract 
context maintains one such mapping per stage. Thus S is 
a stack of frames, one for each stage; a frame maps each 
variable name to the label of its binding context. 

For instance, the two uses of x in the SLamJS expression 
fun(x){box (fun(x){(unbox x)(x)})} are at different stages,
and hence bound by different abstractions: the first x by the 
outer abstraction, the second by the inner one. 

The acceptability judgement for the improved analysis is
now of the form $\Gamma, S \models e$, and the derivation rules include
additional bookkeeping to adjust $S$ when analysing subexpressions
at different stages. While conceptually simple, this
change somewhat complicates the formalism, so we do not
present it in detail here; a full formalisation is available in
the supporting material.

D. Information Flow for SLamJS 

Assume we have already analysed a program using 0CFA
and found environments $\Gamma, \varrho$ that over-approximate the
values flowing to each labelled expression. We use information
about which functions and boxed values may occur to assist
in determining what direct and indirect flows occur between
labels of the expression.

By recursing over the structure of an expression, we
generate constraints on a relation $\rightsquigarrow$:

$$\rightsquigarrow \;:\; (\mathit{Label} \uplus \mathit{Name} \uplus \mathit{Marker}) \times (\mathit{Label} \uplus \mathit{Name} \uplus \mathit{Marker})$$

Since an expression is finite, as are the sets of labels, variable
names and markers occurring within it and the abstract values in the
results of 0CFA for it, the process will terminate.

We express constraints between labels, variable names and
markers as either direct flows ($x \rightsquigarrow_{D} y \triangleq x \rightsquigarrow y$) or
indirect flows ($x \rightsquigarrow_{I} y \triangleq x \rightsquigarrow y$). (The distinction
between direct and indirect is for clarity of exposition; there
is no practical difference between them with regard to the
resulting analysis.)

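Note that if instead we interpret $x \rightsquigarrow_{D} y$ and $x \rightsquigarrow_{I} y$ as
(elements of) relations and define $\rightsquigarrow \;=\; \rightsquigarrow_{D} \cup \rightsquigarrow_{I}$, then
$\rightsquigarrow$ satisfies the constraints.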
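Once the direct and indirect flows are known, asking which markers can influence a given label is a reachability question over their union. The sketch below uses a backwards graph search; the function name `flows_into` and the example edges are hypothetical and are not taken from the paper's figures.

```python
def flows_into(target, direct, indirect):
    """Return every node x such that x reaches target via direct or indirect flows."""
    preds = {}
    for src, dst in set(direct) | set(indirect):
        preds.setdefault(dst, set()).add(src)
    seen, worklist = {target}, [target]
    while worklist:
        node = worklist.pop()
        for src in preds.get(node, ()):
            if src not in seen:
                seen.add(src)
                worklist.append(src)
    return seen


# Hypothetical constraint graph over markers (strings) and labels (integers).
direct = {("H", 1), (1, 2), ("L", 4)}
indirect = {("I", 2), (2, 3)}
sources = flows_into(3, direct, indirect)
```

For this graph the markers H and I reach label 3, whereas L does not, so a result labelled 3 would be reported as independent of L.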

We say that $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e$ if
$\Gamma, \varrho \models e$ and the conditions in Figure 8 hold. As $\Gamma$,
$\varrho$ and $\rightsquigarrow$ are constant throughout the definition, we
abbreviate $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e$ to
$\models_{\mathrm{IF}} e$ for clarity.

Figure 8. Rules for generating information flow constraints



We now prove the coherence of our information flow 
analysis with evaluation. Like the corresponding proof for 
our 0CFA, this is lengthy and technical, so we only sketch
it here. A mechanisation of the proof is available online. 

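Lemma 2 (Reduction Preserves Satisfaction): If we have
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_1^{\ell_1}$ and also
$t_1^{\ell_1} \overset{n}{\dashrightarrow} t_2^{\ell_2}$, then
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_2^{\ell_2}$.
Furthermore, $\ell_2 \rightsquigarrow^{*} \ell_1$.

Proof: By case analysis on the rules defining $\overset{n}{\dashrightarrow}$. ∎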

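Theorem 3 (Information Flow Coherence): If we have
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e_1$ and also $e_1 \rightarrow e_2$,
then $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e_2$.
Furthermore, $\mathit{lbl}(e_2) \rightsquigarrow^{*} \mathit{lbl}(e_1)$.

Proof: Sketch: Unfolding the definition of $\rightarrow$, we let
$e_1 = C^m_n\langle t_1^{\ell_1} \rangle$ and $e_2 = C^m_n\langle t_2^{\ell_2} \rangle$ with
$t_1^{\ell_1} \overset{n}{\dashrightarrow} t_2^{\ell_2}$.

Observe that $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_1^{\ell_1}$; hence, applying
Lemma 2, $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_2^{\ell_2}$ and
$\ell_2 \rightsquigarrow^{*} \ell_1$. Observe further
that constraints generated by $C^m_n$ and the contents of its
hole interact only at that hole, labelled $\ell_2$ or $\ell_1$. Thus, using
$\ell_2 \rightsquigarrow^{*} \ell_1$, they must be satisfied in the conclusion, giving
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} C^m_n\langle t_2^{\ell_2} \rangle$ as required.

The claim that $\mathit{lbl}(e_2) \rightsquigarrow^{*} \mathit{lbl}(e_1)$ is trivial for all
non-empty contexts, as $\mathit{lbl}(e_2) = \mathit{lbl}(e_1)$. For the empty context,
it follows directly from the similar claim in Lemma 2. ∎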

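Example 8: Recall once more Example 5. Using the
results of 0CFA, our implementation generates the relations
$\rightsquigarrow_{D}$ and $\rightsquigarrow_{I}$ as depicted in Figure 9.

Setting $\rightsquigarrow \;=\; \rightsquigarrow_{D} \cup \rightsquigarrow_{I}$, we have
$H \rightsquigarrow^{*} 9$ and $I \rightsquigarrow^{*} 9$ and $L \not\rightsquigarrow^{*} 9$.
As expected, this means the result (labelled 9) has
information flows from H and I, but not L.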



Figure 9. Information flow constraints for Example 5

E. Information Flow Soundness 

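Theorem 4 (Information Flow Soundness): Suppose
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t^{\ell}$. Then if
$t^{\ell} \rightarrow^{*} v^{\ell'}$, where $v$ is a stage-0 value
composed only of markers and constants, then $\lfloor v \rfloor_M = v$
where $M = \{ m \in \mathit{Marker} \mid m \rightsquigarrow^{*} \ell \}$.

Proof: First show that $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} v^{\ell'}$ with
$\ell' \rightsquigarrow^{*} \ell$. Argue by a simple induction over the derivation of
$t^{\ell} \rightarrow^{*} v^{\ell'}$.

Base case: $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t^{\ell}$ follows
immediately from the theorem's premise.

Inductive step: Assume that $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e_1$ and
$\mathit{lbl}(e_1) \rightsquigarrow^{*} \ell$, with $e_1 \rightarrow e_2$ the next step in the
derivation. Apply Theorem 3 to show that
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} e_2$ and
$\mathit{lbl}(e_2) \rightsquigarrow^{*} \mathit{lbl}(e_1)$; hence
$\mathit{lbl}(e_2) \rightsquigarrow^{*} \ell$.

Now we have $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} v^{\ell'}$ and
$\ell' \rightsquigarrow^{*} \ell$. Observe from the definition of $\lfloor v \rfloor_M$ that
if for every marker $m$ that occurs in $v$ we have $m \in M$, then
$\lfloor v \rfloor_M = v$.

But $v$ is a value composed only of markers and constants,
so for every marker $m$ that occurs in $v$ (by examination of
the $\models_{\mathrm{IF}}$ constraint rules) it must be the case that
$m \rightsquigarrow^{*} \ell'$. Thus, as $\ell' \rightsquigarrow^{*} \ell$,
$m \rightsquigarrow^{*} \ell$. Hence, from the definition of
$M$, $m \in M$. So it is indeed true that $\lfloor v \rfloor_M = v$. ∎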



Relationship with Noninterference: Our information flow 
analysis can be used to verify the security property nonin- 
terference. Noninterference asserts that the values of any 
"high-security" inputs must not affect the values of any 
"low-security" outputs. In order for this assertion to be 
meaningful, we must have notions of input, output and high- 
and low-security levels. 

For example, assume elements of Marker represent different
levels of security, such as L for low security and H
for high security. For input, assume two relations
$\xrightarrow{\text{low}}$ and $\xrightarrow{\text{high}}$, which take an
expression and set the values of low and high inputs respectively.
For low-security output, just take the value to which an expression
evaluates.

Say that an expression $t^{\ell}$ satisfies noninterference analysis if
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t^{\ell}$ and
$H \not\rightsquigarrow^{*} \ell$. Further, require that
$\xrightarrow{\text{low}}$ and $\xrightarrow{\text{high}}$
satisfy the following conditions:
Claim: If $t$ satisfies noninterference analysis, then in the
following situation:

$$t \xrightarrow{\text{low}} t', \qquad t' \xrightarrow{\text{high}} t_1, \qquad t' \xrightarrow{\text{high}} t_2, \qquad t_1 \rightarrow^{*} u^{\ell'}$$

where $u$ is a value composed only of markers and constants,
it follows that $t_2 \rightarrow^{*} u^{\ell'}$. That is, the low output $u$ is
independent from the values of the high inputs for $t$ selected
using $\xrightarrow{\text{high}}$.

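Proof: By the condition on $\xrightarrow{\text{low}}$, observe we have
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t'$. By the first condition on
$\xrightarrow{\text{high}}$ it then follows
that $\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_1$ and
$\Gamma, \varrho, \rightsquigarrow \models_{\mathrm{IF}} t_2$. As
$H \not\rightsquigarrow^{*} \ell$, by soundness of information flow, we have
$u = \lfloor u \rfloor_{\{L\}}$. So using stability, we get
$\lfloor t_1 \rfloor_{\{L\}} \rightarrow^{*} u$. But, by the second
condition on $\xrightarrow{\text{high}}$, we have
$\lfloor t' \rfloor_{\{L\}} = \lfloor t_1 \rfloor_{\{L\}} = \lfloor t_2 \rfloor_{\{L\}}$. So
$\lfloor t_2 \rfloor_{\{L\}} \rightarrow^{*} u$. Then by monotonicity,
$t_2 \rightarrow^{*} u$. ∎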

The conditions on $\xrightarrow{low}$ and $\xrightarrow{high}$ seem reasonable. As
an example, relations $\xrightarrow{low}$ and $\xrightarrow{high}$ that can only replace constants
marked as L and H respectively and can only replace them
with constants of the same type (integer, boolean or string)
satisfy these conditions.
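
The claim can be illustrated by brute force on a toy program. The following Python sketch is not part of our OCaml tool: the names noninterferent, leaky and safe are hypothetical, and the check only samples a few high inputs of the same type rather than proving anything.

```python
from typing import Callable, Iterable


def noninterferent(program: Callable[[int, int], object],
                   low: int,
                   highs: Iterable[int]) -> bool:
    """Return True if the output is the same for every tested high input.

    This is a test over the supplied samples only, not a proof.
    """
    outputs = {repr(program(low, high)) for high in highs}
    return len(outputs) <= 1


def leaky(low: int, high: int) -> int:
    # Indirect flow: the branch taken depends on the high input.
    return low if high > 0 else 0


def safe(low: int, high: int) -> int:
    # The high input is ignored entirely.
    return low + 1
```

A static analysis such as ours aims to establish the same independence for all possible high inputs without running the program.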

IV. Evaluation 

We have implemented our analysis in OCaml and tested 
it on a range of examples. The source code for our analysis 
tool and the examples are available online. We now present 
some of these examples. 

For each example, we list the markers on which our
analysis says the result may depend. To improve readability,
we write let x = u in e as a shorthand for fun(x){e}(u).
Our implementation extends SLamJS (and its analysis) as
presented in this paper with primitive arithmetic, equality
and typeof operators, which we use in some of our examples.
It can also handle mutable references in the style of
λJS and a subset of actual JavaScript syntax. Many of our
examples are inspired by patterns of eval usage common 
in Web applications, as surveyed by Richards et al. [s] and 
discussed by Jensen et al. [10].

Example 9: Depends on: H, L.
if(H : true){L : false}else{1}

We begin with a classic example where branching on a value
introduces an indirect flow from it. As our analysis does not
track specific boolean values, it would give the same result
if the branch were on (H : false). We could resolve this
imprecision by extending our abstract value domain with
abstract values for true and false.
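
The effect of such an extension can be sketched in a few lines of Python. This is only an illustration, not the constraint rule of our analysis: if_deps, AbsBool and the marker sets are hypothetical names, and the else branch is assumed to carry no marker.

```python
from typing import FrozenSet, Optional, Tuple

Markers = FrozenSet[str]
# An abstract boolean: (markers it depends on, known truth value or None).
AbsBool = Tuple[Markers, Optional[bool]]


def if_deps(cond: AbsBool, then_deps: Markers, else_deps: Markers) -> Markers:
    """Markers the result of a conditional may depend on."""
    cond_markers, known = cond
    if known is True:
        return cond_markers | then_deps
    if known is False:
        return cond_markers | else_deps
    # Truth value unknown: both branches may be taken.
    return cond_markers | then_deps | else_deps
```

With the truth value unknown, branching on an H-marked condition yields both H and L; once the domain records that the condition is false, only H remains.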

Example 10: Depends on: H, I, L. Depends (improved): H, L.
let ctrue = fun(x){fun(y){x}} in
let cif = fun(x){fun(y){fun(z){(x(y))(z)}}} in
((cif(H : ctrue))(L : false))(I : 1)

Conversely, if we present the previous example using the
standard Church encodings of if and true as functions, our
analysis is precise enough to determine that the result does
not depend on I. Note that we need the improved analysis
to distinguish the bindings of x and y in ctrue and cif.
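
For readers less familiar with Church encodings, the same two functions can be written as Python lambdas (a transliteration for illustration only; markers are omitted because Python has no counterpart to them):

```python
# Church-encoded true: selects its first argument.
ctrue = lambda x: lambda y: x
# Church-encoded conditional: applies the encoded boolean to both branches.
cif = lambda x: lambda y: lambda z: x(y)(z)

# The third argument is discarded, so the result cannot depend on it.
result = cif(ctrue)(False)(1)
```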

Example 11: Depends on: L.
let x = if(true){box f}else{box g} in
let f = fun(y){1} in
let g = fun(z){L : true} in
run (box ((unbox x)(H : undef)))

This example is modelled on the following JavaScript usage
pattern [10]:
if (...) x = "f";
else x = "g";
eval(x + "()");

f and g are bound to functions; x is set to a code value
of either f or g; a function argument is added to the code
value and the result executed. In this example, both f and
g ignore their argument (H : undef), so the result does not
depend on H; our analysis correctly identifies this.

Example 12: Depends on: H, L. Depends (improved): L.
let c = box x in
let x = L : 1 in
let eval = fun(b){run b} in
let x = H : 2 in
eval(c)

JavaScript programmers sometimes use eval to execute code
within a different scope. SLamJS does not aim to emulate
all the quirks of eval, but scoping of staged code can still
have interesting behaviour, as shown in this example. In the
scope of the definition of the function bound to eval, x is 1.
So when it evaluates the code value c, which contains just
the variable x, this is the value it returns; note that x was
not bound at all where c was defined. The second binding
of x is unused; our analysis correctly determines this.
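
A rough Python analogue of this scoping behaviour, using a code string in place of a boxed expression, is sketched below. It is an analogy only: Python's eval takes its scope as an explicit dictionary, and make_eval is a hypothetical helper.

```python
# The code value: just the free variable x, not bound where it is created.
c = "x"


def make_eval(scope: dict):
    """Return a function that runs code strings in the given scope."""
    def run(code: str):
        # Names in the code string are resolved in `scope`.
        return eval(code, {}, scope)
    return run


eval_in_first_scope = make_eval({"x": 1})   # scope where x is 1
x = 2                                       # later binding, never consulted
result = eval_in_first_scope(c)
```

Here result is 1: the code value picks up the binding visible where the evaluating function was defined, not the later one.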



Example 13: Depends on: H, I, L.
let z = I :
  {"_proto_" : null, "x" : (H : 1), "y" : (L : 2)} in
let s = fun(id){let f = box (z[unbox id]) in run f} in
s(box "y")

Some JavaScript programmers use eval to construct variable
names, as in (var n = 5; eval("f_" + n);) to
access f_5. We cannot express this directly in SLamJS
because there are no facilities to manipulate variable names.
Another common practice is to use eval to access object
properties, often because of the programmer's ignorance of
JavaScript's indirect object field access syntax; this example
models that practice in SLamJS. Because our analysis does
not model the values strings may take, its handling of field
reads and writes is rather coarse, so it cannot tell the result
will not depend on H; this could be addressed by refining our
abstract value domain.
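
The practice being modelled can be illustrated with a small Python analogue (an assumption-laden sketch: a dictionary stands in for the JavaScript object, and both function names are hypothetical):

```python
z = {"x": 1, "y": 2}


def select_with_eval(field_name: str):
    # Builds an access expression as a string and evaluates it.
    return eval("z[" + repr(field_name) + "]", {"z": z})


def select_directly(field_name: str):
    # Indirect field access makes eval unnecessary.
    return z[field_name]
```

Both functions return the same field, but only the first forces an analysis to reason about generated code.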

Example 14: Depends on: H.
let fst = fun(x){fun(y){x}} in
let f = if(false){fst}else{box fst} in
let x = (H : 1) in
let y = (L : true) in
if(typeof f = "function"){(f(x))(y)}
else{run (box (((unbox f)(x))(y)))}

This example models the JavaScript usage pattern:

if (f instanceof Function) f(x);
else eval(f + "(x)");

which may arise when using eval to emulate higher-order 
functions. Here, our analysis shows the same precision on a 
boxed value representing a function as when dealing with a 
real function. 
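
The same pattern can be written in Python as follows. This is an illustrative analogue, not code from our test suite; first and call are hypothetical names.

```python
def first(x):
    return x


def call(f, x):
    """Apply f to x, where f is a function or the name of one."""
    if callable(f):
        return f(x)
    # f is a code string: build the call expression and evaluate it.
    return eval(f + "(x)", {"first": first}, {"x": x})
```

Both call(first, 1) and call("first", 1) produce the same value, which is the behaviour the analysis has to treat uniformly.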

Example 15: Depends on: H, L. Depends (improved): L.
let pair = fun(x){fun(y){fun(z){run z}}} in
let fst = fun(z){z(box x)} in
let snd = fun(z){z(box y)} in
let bp = box ((pair(L : (box (1))))(H : (box (true)))) in
let boxfst = box ((fst)(unbox bp)) in
run (run (boxfst))

Most examples of staged metaprogramming in the literature 
do not use more than one level of staging. This example, 
which pairs and unpairs two values in a rather roundabout 
way, illustrates that we can handle higher levels too. 

Example 16: Depends on: H.
fun(n){(fun(x){(x(x))(n)})
  (fun(x){fun(y){if(y = 0){true}else{(x(x))(y - 1)}}})
}(H : 5)

This program loops n times (where n is (H : 5) in this
instance) before returning true. In this sense, the result
is independent of n: if n were a high-security input and
the output low, the program would satisfy noninterference,
although the duration of execution may leak information
about n. However, n must be examined in order to execute
the program, so there is an information flow from n to
the result, in the sense captured by our labelled semantics.
That is, no noninterference analysis based on a sound over-
approximation of the behaviour of such a labelled semantics
could ever show the program to be noninterfering.
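
A direct Python transliteration (for illustration; loop is a hypothetical name and n is assumed to be a non-negative integer) makes the behaviour easy to check by hand:

```python
def loop(n: int) -> bool:
    # Self-application gives recursion without a named recursive function.
    step = lambda x: lambda y: True if y == 0 else x(x)(y - 1)
    return (lambda x: x(x)(n))(step)
```

The function returns True for every non-negative n, yet each call inspects n, which is exactly the flow our labelled semantics records.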

Example 17: Depends on: L.
let fst = fun(x){fun(y){x}} in
let a = box x in
let b = box (fun(x){fun(y){fst(unbox a)(y)}}) in
(run b)(L : 1)(H : 2)

This program, based on an example from Choi et al. [9],
splices a variable name into a code template to produce code
that takes two arguments and returns the first. Our analysis
correctly determines that the result depends only on the first.
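
The splice can be mimicked with Python code strings, as in the sketch below. It is an analogy under stated assumptions: string formatting stands in for unbox inside box, eval stands in for run, and template and hole are hypothetical names.

```python
fst = lambda x: lambda y: x

a = "x"                                          # code value: a variable name
template = "lambda x: lambda y: fst({hole})(y)"  # code with one splice point
b = template.format(hole=a)                      # splice the name into the template
run_b = eval(b, {"fst": fst})                    # run the generated code
result = run_b(1)(2)
```

The spliced name x is captured by the lambda in the template, so the generated function returns its first argument.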

Example 18: Depends on: L, H.
let fst = fun(x){fun(y){x}} in
let a = fun(p){p["x"]} in
let b = (fun(h){fun(p){fun(x){fun(y){
    fst(h((p["x"] = x)["y"] = y))(y)}}}})(a) in
b({"_proto_" : null})(L : 1)(H : 2)

By applying Choi et al.'s unstaging translation to the core of 
the previous example, we obtain this unstaged one. Note that 
while the result of the program is the same, we lose precision 
by analysing this version instead of working directly on the 
staged version. 

Example 19: Depends on: L, H. Depends (improved): L.
let blank = fun(get){get(null)(null)} in
let getx = fun(x){fun(y){x}} in
let gety = fun(x){fun(y){y}} in
let setx = fun(env){fun(newx){
    fun(get){get(newx)(env(gety))}}} in
let sety = fun(env){fun(newy){
    fun(get){get(env(getx))(newy)}}} in
let fst = fun(x){fun(y){x}} in
let a = fun(p){p(getx)} in
let b = (fun(h){fun(p){fun(x){fun(y){
    fst(h(sety(setx(p)(x))(y)))(y)}}}})(a) in
b(blank)(L : 1)(H : 2)

Here we have applied the unstaging translation, as in the 
previous example, but using higher order functions to encode 
environments instead of records. In this case, we can recover 
the lost precision, but at the cost of an 0{n^) increase in the 
size of the source program, making the combined analysis 
0{n^) instead of 0{n^) [14]. 
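
The environment encoding used in this example can be tried out directly in Python. The sketch below is a transliteration for illustration only, with null rendered as None and markers omitted:

```python
# An environment is a function that hands its two slots to a selector.
blank = lambda get: get(None)(None)
getx = lambda x: lambda y: x
gety = lambda x: lambda y: y
# Setters build a new environment, copying the slot they do not change.
setx = lambda env: lambda newx: lambda get: get(newx)(env(gety))
sety = lambda env: lambda newy: lambda get: get(env(getx))(newy)

fst = lambda x: lambda y: x
a = lambda p: p(getx)
b = (lambda h: lambda p: lambda x: lambda y:
     fst(h(sety(setx(p)(x))(y)))(y))(a)

result = b(blank)(1)(2)
```

Each update wraps the previous environment in another closure, which hints at why the encoded program grows faster than the staged original.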

V. Related Work 

A. From SLamJS to JavaScript Applications 

The application that guided our work is information
flow analysis for JavaScript in Web applications. We now
consider some of the features of this scenario that we have
not addressed and how they have been handled by others.
We claim that most of the problems have been addressed,
although combining them into a single analysis system
would require further effort.

Handling of Primitive Datatypes: As demonstrated in
some of our examples, our analysis models its primitive 
datatypes (such as strings and booleans) very coarsely; our 
abstract domains are too simple. Fortunately, more refined 
abstractions for these datatypes have been well-studied [15]. 

Imperative Control Flow and Exceptions: JavaScript has
several features not found in SLamJS, including typical 
imperative control flow features (such as for loops) and 
exceptions, but there are CFA-style analyses for JavaScript. 
Perhaps most notable is the recent CFA2 analysis [: ], which 
was developed for JavaScript and features significantly better 
analysis of higher order flow control. 

JavaScript Semantics: A bigger problem in producing a
sound analysis of JavaScript is the complexity and quaintness
of its semantics [7]. Guha et al. attempt to simplify this
problem by producing a much simpler "core calculus" for
JavaScript called λJS and a transformation from JavaScript
into λJS [3]. They have mechanised various proofs about
their language in Coq. As Web applications execute in the
context of a webpage in a browser, an analysis must also
model how a webpage interacts with code via the DOM.

Code Strings vs Staged Code: Perhaps the most relevant
difference between JavaScript and SLamJS is our metaprogramming
constructs: JavaScript eval runs on strings, while,
in an effort to develop a more principled analysis, our staged
metaprogramming follows the tradition of Lisp quotations.
To analyse uses of eval with our techniques, we would
need a sound transformation into staged metaprogramming.
Jensen et al. use the result of a string analysis produced by
the tool TAJS to replace certain uses of eval with unstaged
code where it is safe to do so [10]; the transformed program
is then fed back into the analysis tool. We propose to handle
a wider range of use cases with the more general approach
of transforming eval on strings into staged code and then
analysing the staged code.

Reactive Systems: A practical Web application is not simply
a program that takes inputs, runs once, then gives output:
it may interleave input and output throughout its execution,
which might not terminate. Bohannon et al. consider the
consequences of this for information security in their work
on reactive noninterference [16].

Infrastructural Issues: In applying an information flow
analysis to a Web application, several infrastructural issues
need to be addressed. Would the code be analysed before
being published on a webserver, in the browser running it
or by some proxy in between? Will the entire code be available
in advance, or must it be analysed in fragments [17]?
Who would set the security policies that the analysis should
enforce? Li and Zdancewic argue that noninterference alone
is too strict a policy to enforce and that a practical policy
must allow for limited declassification [18].



B. Information Flow Analysis 

Early work on information flow security focused on monitoring
program execution, dynamically marking variables
to indicate their level of confidentiality [19]. However, the
study of static analysis for information flow security can essentially
be traced back to Denning, who introduced a lattice
model for secure information flow and critically considered
both direct and indirect flows [20]. Denning and Denning
developed a simple static information flow analysis that
rejected programs with flows violating a security policy [21].

Noninterference: Goguen and Meseguer introduced the
idea of noninterference [1] (the inability of the actions of one
party, or equivalently data at one level, to influence those of
another) as a way of specifying security policies, including
enforcement of information flow security. Noninterference
and information flow security became almost synonymous,
although Pottier and Conchon were careful to emphasise the
distinction between the two [11].

Security Type Systems: Security type systems became a
common way of enforcing noninterference policies and 
proving the correctness of noninterference analyses, pro- 
gressing from a reformulation of Denning and Denning's 
analysis [22] to Simonet and Pottier's type system for 
ML [( ]. Unfortunately, the requirement that the program 
analysed follow a strict type discipline makes it impractical 
to apply these ideas to dynamically typed languages such 
as JavaScript. Perhaps as a consequence, information flow 
in untyped and dynamically typed languages is relatively 
poorly understood. 

Dynamic Analyses: Dynamic information flow analysis circumvents
the need for a type system or other static analysis
by tracking information flow during program execution, and 
enforcing security policies by aborting program execution 
if an undesired flow is detected; examples of such analyses 
for JavaScript are presented by Just et al. [23] and Hedin 
and Sabelfeld [24]. Indeed, the problems they address and 
their motivations are very similar to ours, but our methods 
are very different. 

Dynamic vs Static: A dynamic analysis only observes
one program run at a time, so dynamic code generation
is easy to handle. However, care has to be taken to track
indirect information flow due to code that was not executed
in the observed run. Strategies to achieve this include, for
instance, the no-sensitive-upgrade check [25], which aborts
execution if a public variable is assigned in code that is
control dependent on private data. As a rule, however, such
strategies are fairly coarse and could potentially abort many
innocuous executions; thus it is commonly held that static
analyses are superior to dynamic ones in their treatment of
indirect flows [26], although there has been a resurgence of
interest in dynamic analyses [27].
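
To make the no-sensitive-upgrade check concrete, here is a minimal Python sketch of such a monitor. It is an illustration under stated assumptions rather than any published implementation: the two-level labels "L" and "H", the default label "L" for fresh variables, and the names Monitor and SecurityError are all hypothetical.

```python
class SecurityError(Exception):
    """Raised when the monitor aborts execution."""


class Monitor:
    def __init__(self):
        self.labels = {}    # variable name -> "L" (public) or "H" (private)
        self.values = {}

    def assign(self, var, value, value_label, pc):
        """Assign under program-counter label pc ("H" inside a private branch)."""
        old = self.labels.get(var, "L")
        if pc == "H" and old == "L":
            # No-sensitive-upgrade: a public variable may not be written
            # in a context that is control dependent on private data.
            raise SecurityError(f"upgrade of public variable {var!r}")
        self.values[var] = value
        self.labels[var] = "H" if "H" in (value_label, pc) else "L"
```

The monitor aborts even when the assigned value would not actually have leaked anything, which is the coarseness referred to above.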

Hybrid Approaches: As a compromise, Chugh et al. [17]
propose extending a static information flow analysis with a
dynamic component that performs additional checks at runtime
when dynamically generated code becomes available.
The static part of their analysis is similar to ours (minus
staging), although they do not formally state or prove its
soundness. Their study of JavaScript on popular websites
suggests the static part is precise enough to be useful.
Because the additional checks on dynamically generated
code occur at runtime, they must necessarily be quick and
simple to avoid performance degradation. Consequently,
these checks are limited to purely syntactic isolation properties,
with a corresponding loss of precision. Our fully static
analysis does not suffer from these limitations.

Going in the other direction, Austin and Flanagan [28] 
have proposed faceted execution, a form of dynamic analysis 
that explores different execution paths and can thus recover 
some of the advantages of a static analysis. 

C. Static Analysis of Staged Metaprogramming

Many different approaches to staged metaprogramming
have been proposed. Our language's staging constructs are
modelled after the language λS of Choi et al. [9]. However,
our semantics of variable capture are different. For
example, we allow the program (fun(x){run (box x)}(1)),
which behaves much like this JavaScript program:
(function (x) {return eval("x")}) (1);

Control flow analysis for a two-staged language has been 
investigated by Kim et al. [29]. Their approach is based on 
abstract interpretation, putting particular emphasis on infer- 
ring an over-approximation of all possible pieces of code to 
which a code quotation may evaluate. This information is not 
explicitly computed by our analysis, so it is quite possible 
that their analysis is more precise than ours. However, it does
not seem to have been implemented yet.

Choi et al. [9] propose a more general framework for 
static analysis of multi-staged programs, which is based 
on an unstaging translation that replaces staging constructs 
with function abstractions and applications. Under certain 
conditions, analysis results for the unstaged program can 
then be translated back to its staged version. 

There are some limitations to their work. Most significantly,
many interesting programs, such as the one mentioned
earlier, are not valid in λS and hence cannot be
unstaged using their translation; this limits its applicability
to JavaScript. Furthermore, as shown in Examples 17-19,
the precision of the resulting combined analysis is highly
sensitive to the target language encoding used in the translation
and the behaviour of the target language analysis. While
their approach is useful as a quick way of adding staging
to an existing language and analysis, we argue that staging
constructs are sufficiently important and complex that we
should aim to analyse them directly.

Inoue and Taha [30] consider the problem of reasoning
about staged programs; in particular, they identify equivalences
that fail to hold in the presence of staging, and
develop a notion of bisimulation that can be used to prove
extensionality of function abstractions, and work around
some of the failing equivalences. Their language differs from
ours in that it avoids name capture.

VI. Conclusions 

We have presented a fully static information flow analysis
based on 0CFA for a dynamically typed language
with staged metaprogramming, implemented it and formally
proved its soundness. We believe our approach is
transferable to other CFA-style analyses and applicable to
JavaScript.

Progressing from here, there are three obvious lines of 
work. The first is to improve the precision of the analysis 
by applying its ideas to CFA2 or using results from abstract 
interpretation. The second is to extend the language to 
handle more features, such as imperative control flow and 
exceptions. The third and most important is to apply string 
analysis techniques to produce a sound transformation from 
a language with eval on code strings to a language with 
staged code values. 

All the pieces are now in place for an interesting, sound
and principled analysis of JavaScript with eval, but it will
take significant effort to bring them together.

Appendix
SLamJS Semantics Definitions

Some of the less interesting definitions of SLamJS semantics
are given here in full.



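$$
\begin{array}{l}
\{\overline{s : v^{m}},\ s : C^{m},\ \overline{s : e}\} \\
\mathtt{fun}(x)\{C^{m+1}\} \\
C^{m}(e) \\
\mathtt{box}\ C^{m+1} \\
\mathtt{unbox}\ C^{m} \\
\mathtt{run}\ C^{m} \\
\mathtt{if}(C^{m})\{e\}\mathtt{else}\{e\} \\
\mathtt{if}(v^{m+1})\{C^{m+1}\}\mathtt{else}\{e\} \\
\mathtt{if}(v^{m+1})\{v^{m+1}\}\mathtt{else}\{C^{m+1}\} \\
C^{m}[e] \\
v^{m}[C^{m}] \\
C^{m}[e] = e \\
v^{m}[C^{m}] = e \\
v^{m}[v^{m}] = C^{m} \\
\mathtt{del}\ C^{m}[e] \\
\mathtt{del}\ v^{m}[C^{m}] \\
\mathtt{run}\ C^{m}\ \mathtt{in}\ \rho
\end{array}
$$

Figure 10. Evaluation contexts

$$
\begin{array}{rcl}
(k, \rho) & \to_{n} & k \\
(\{\overline{s : e}\}, \rho) & \to_{n} & \{\overline{s : (e, \rho)}\} \\
(x, \rho) & \to_{n+1} & x \\
(\mathtt{fun}(x)\{e\}, \rho) & \to_{n+1} & \mathtt{fun}(x)\{(e, \rho)\} \\
(e_1(e_2), \rho) & \to_{n} & (e_1, \rho)((e_2, \rho)) \\
(\mathtt{box}\ e, \rho) & \to_{n} & \mathtt{box}\ (e, \rho) \\
(\mathtt{unbox}\ e, \rho) & \to_{n} & \mathtt{unbox}\ (e, \rho) \\
(\mathtt{run}\ e, \rho) & \to_{0} & \mathtt{run}\ (e, \rho)\ \mathtt{in}\ \rho \\
(\mathtt{run}\ e, \rho) & \to_{n+1} & \mathtt{run}\ (e, \rho) \\
(\mathtt{if}(e_1)\{e_2\}\mathtt{else}\{e_3\}, \rho) & \to_{n} & \mathtt{if}((e_1, \rho))\{(e_2, \rho)\}\mathtt{else}\{(e_3, \rho)\} \\
(e_1[e_2], \rho) & \to_{n} & (e_1, \rho)[(e_2, \rho)] \\
(e_1[e_2] = e_3, \rho) & \to_{n} & (e_1, \rho)[(e_2, \rho)] = (e_3, \rho) \\
(\mathtt{del}\ e_1[e_2], \rho) & \to_{n} & \mathtt{del}\ (e_1, \rho)[(e_2, \rho)]
\end{array}
$$

Figure 11. Environment propagation rules

$$
\begin{array}{rcl}
\lfloor k \rfloor_M & = & k \\
\lfloor \{\overline{s : e}\} \rfloor_M & = & \{\overline{s : \lfloor e \rfloor_M}\} \\
\lfloor x \rfloor_M & = & x \\
\lfloor \mathtt{fun}(x)\{e\} \rfloor_M & = & \mathtt{fun}(x)\{\lfloor e \rfloor_M\} \\
\lfloor e_1(e_2) \rfloor_M & = & \lfloor e_1 \rfloor_M(\lfloor e_2 \rfloor_M) \\
\lfloor \mathtt{box}\ e \rfloor_M & = & \mathtt{box}\ \lfloor e \rfloor_M \\
\lfloor \mathtt{unbox}\ e \rfloor_M & = & \mathtt{unbox}\ \lfloor e \rfloor_M \\
\lfloor \mathtt{run}\ e \rfloor_M & = & \mathtt{run}\ \lfloor e \rfloor_M \\
\lfloor \mathtt{if}(e_1)\{e_2\}\mathtt{else}\{e_3\} \rfloor_M & = & \mathtt{if}(\lfloor e_1 \rfloor_M)\{\lfloor e_2 \rfloor_M\}\mathtt{else}\{\lfloor e_3 \rfloor_M\} \\
\lfloor e_1[e_2] \rfloor_M & = & \lfloor e_1 \rfloor_M[\lfloor e_2 \rfloor_M] \\
\lfloor e_1[e_2] = e_3 \rfloor_M & = & \lfloor e_1 \rfloor_M[\lfloor e_2 \rfloor_M] = \lfloor e_3 \rfloor_M \\
\lfloor \mathtt{del}\ e_1[e_2] \rfloor_M & = & \mathtt{del}\ \lfloor e_1 \rfloor_M[\lfloor e_2 \rfloor_M] \\
\lfloor (e, \rho) \rfloor_M & = & (\lfloor e \rfloor_M, \lfloor \rho \rfloor_M) \\
\lfloor \mathtt{run}\ e\ \mathtt{in}\ \rho \rfloor_M & = & \mathtt{run}\ \lfloor e \rfloor_M\ \mathtt{in}\ \lfloor \rho \rfloor_M \\
\lfloor \rho \rfloor_M(x) & = & \lfloor \rho(x) \rfloor_M \\
\lfloor m : e \rfloor_M & = & m : \lfloor e \rfloor_M \quad \text{if } m \in M \\
\lfloor m : e \rfloor_M & = & \quad \text{if } m \notin M
\end{array}
$$

Figure 12. Definition of $\lfloor e \rfloor_M$, the M-erasure of e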
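
The M-erasure can be prototyped in a few lines of Python, as a sketch under several assumptions: terms are encoded as nested tuples, only a fragment of the syntax is covered, and the placeholder HOLE is a hypothetical stand-in for whatever an erased subterm becomes.

```python
HOLE = ("hole",)   # assumed stand-in for an erased subterm


def erase(term, allowed):
    """M-erasure over a tiny term language encoded as nested tuples.

    ("const", k) and ("var", x) are leaves, ("mark", m, e) is a marked
    expression, and any other tag is followed by its components,
    e.g. ("app", e1, e2), ("box", e) or ("fun", "x", e).
    """
    if not isinstance(term, tuple):
        return term                      # names and literals inside a node
    tag = term[0]
    if tag in ("const", "var"):
        return term
    if tag == "mark":
        _, marker, body = term
        if marker in allowed:
            return ("mark", marker, erase(body, allowed))
        return HOLE                      # marker not in M: subterm is erased
    # Structural case: erase every component.
    return (tag,) + tuple(erase(sub, allowed) for sub in term[1:])
```

If every marker occurring in a term belongs to the allowed set, erasure returns the term unchanged, which is the observation used in the soundness proof.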



Appendix 
0CFA for SLamJS

A. Labelled Semantics 

We extend the syntax of SLamJS with labels to indicate 
program points. The labels have no effect on the result of 
computation, but are used to track which values may occur at 
which points. Consequently, it is important for the soundness 
of the corresponding analysis that the semantics correctly 
tracks labels. 

We reformulate the syntax of SLamJS to distinguish 
between terms (expressions in the unlabelled semantics) and 
expressions (which are labelled terms): 

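$$
\begin{array}{llcl}
\text{Expressions} & e & ::= & t^{\ell} \\
\text{Terms} & t & ::= & k \mid \{\overline{s : e}\} \mid x \mid \mathtt{fun}(x)\{e\} \mid e(e) \\
 & & \mid & \mathtt{box}\ e \mid \mathtt{unbox}\ e \mid \mathtt{run}\ e \\
 & & \mid & \mathtt{if}(e)\{e\}\mathtt{else}\{e\} \mid e[e] \mid e[e] = e \\
 & & \mid & \mathtt{del}\ e[e] \mid (t, \rho) \mid \mathtt{run}\ e\ \mathtt{in}\ \rho
\end{array}
$$

Values remain expressions, so they include labels at the
outer level. For example, $k^{\ell}$ is a value, rather than $k$.
Contexts other than the empty context also gain labels at
the outer level, so we have $(C^{m}(e))^{\ell}$ rather than $(C^{m}(e))$.

The labelling of the reduction rules is a little more
complicated, so we list them in full in Figures 13, 14 and 15.
For an expression $e = t^{\ell}$, we write $e^{\ell'}$ as a shorthand for
$t^{\ell'}$ and $(e, \rho)^{\ell'}$ for $(t, \rho)^{\ell'}$. Note that we use this in the rules
(Lookup), (Unbox), (Run) and (Read1).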

B. Analysis 

The abstract domains of the analysis are defined in Figure 16.
Abstract variables of the form $x$ represent function
parameters; abstract variables of the form $\ell.p$ represent
record fields. Note that $e$, $\ell$, $x$ and $p$ only range over
expressions, labels and names occurring in the program to
be analysed, hence the abstract domains are finite.

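For a literal $k$, let $\lceil k \rceil$ be its abstract value, that is:

$\lceil \mathtt{null} \rceil$ = NULL
$\lceil \mathtt{undef} \rceil$ = UNDEF
$\lceil b \rceil$ = BOOL for boolean $b$
$\lceil n \rceil$ = NUM for number $n$
$\lceil s \rceil$ = STR for string $s$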
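
As a small illustration, this mapping can be written in Python as follows. The encoding is an assumption made for the sketch: None stands for null, the sentinel UNDEF stands for undef, and abstract values are represented by their names as strings.

```python
UNDEF = object()   # hypothetical sentinel standing for the literal undef


def abstract_literal(k):
    """Map a literal to the name of its abstract value."""
    if k is None:
        return "NULL"
    if k is UNDEF:
        return "UNDEF"
    if isinstance(k, bool):      # must precede the number test: bool is an int subtype
        return "BOOL"
    if isinstance(k, (int, float)):
        return "NUM"
    if isinstance(k, str):
        return "STR"
    raise TypeError(f"not a literal: {k!r}")
```

Every boolean maps to the same abstract value, which is why the analysis cannot tell which branch of a conditional is taken.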

For an abstract environment $\varrho$ and a label $\ell$ we define
$proto(\ell)_{\varrho}$ to be the smallest set $P \subseteq Label$ such that $\ell \in P$
and for any $p \in P$ and $\mathrm{REC}(\ell') \in \varrho(p.\texttt{"\_proto\_"})$ also
$\ell' \in P$.
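
Being a least fixed point, this set can be computed with a worklist. The Python sketch below assumes a particular encoding chosen for illustration: the abstract environment is a dictionary from (label, field) pairs to sets of abstract values, and a record abstract value is the tuple ("REC", label).

```python
def proto_closure(label, env):
    """Smallest set containing `label` and closed under "_proto_" links."""
    closure = {label}
    worklist = [label]
    while worklist:
        p = worklist.pop()
        for value in env.get((p, "_proto_"), ()):
            is_record = isinstance(value, tuple) and value[0] == "REC"
            if is_record and value[1] not in closure:
                closure.add(value[1])
                worklist.append(value[1])
    return closure
```

The loop terminates even on cyclic prototype chains because each label is added to the worklist at most once.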

We define three acceptability judgements $\Gamma, \varrho \models e$;
$\Gamma, \varrho \models \rho$ and $\Gamma, \varrho \models \hat{v} \approx t$ by mutual induction as shown
in Figure 17.



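$$
\begin{array}{rcl}
(k, \rho)^{\ell} & \to_{n} & k^{\ell} \\
(\{\overline{s : t^{\ell'}}\}, \rho)^{\ell} & \to_{n} & \{\overline{s : (t, \rho)^{\ell'}}\}^{\ell} \\
(x, \rho)^{\ell} & \to_{n+1} & x^{\ell} \\
(\mathtt{fun}(x)\{t^{\ell'}\}, \rho)^{\ell} & \to_{n+1} & (\mathtt{fun}(x)\{(t, \rho)^{\ell'}\})^{\ell} \\
(t_1^{\ell_1}(t_2^{\ell_2}), \rho)^{\ell} & \to_{n} & ((t_1, \rho)^{\ell_1}((t_2, \rho)^{\ell_2}))^{\ell} \\
(\mathtt{box}\ t^{\ell'}, \rho)^{\ell} & \to_{n} & (\mathtt{box}\ (t, \rho)^{\ell'})^{\ell} \\
(\mathtt{unbox}\ t^{\ell'}, \rho)^{\ell} & \to_{n} & (\mathtt{unbox}\ (t, \rho)^{\ell'})^{\ell} \\
(\mathtt{run}\ t^{\ell'}, \rho)^{\ell} & \to_{0} & (\mathtt{run}\ (t, \rho)^{\ell'}\ \mathtt{in}\ \rho)^{\ell} \\
(\mathtt{run}\ t^{\ell'}, \rho)^{\ell} & \to_{n+1} & (\mathtt{run}\ (t, \rho)^{\ell'})^{\ell} \\
(\mathtt{if}(t_1^{\ell_1})\{t_2^{\ell_2}\}\mathtt{else}\{t_3^{\ell_3}\}, \rho)^{\ell} & \to_{n} & (\mathtt{if}((t_1, \rho)^{\ell_1})\{(t_2, \rho)^{\ell_2}\}\mathtt{else}\{(t_3, \rho)^{\ell_3}\})^{\ell} \\
(t_1^{\ell_1}[t_2^{\ell_2}], \rho)^{\ell} & \to_{n} & ((t_1, \rho)^{\ell_1}[(t_2, \rho)^{\ell_2}])^{\ell} \\
(t_1^{\ell_1}[t_2^{\ell_2}] = t_3^{\ell_3}, \rho)^{\ell} & \to_{n} & ((t_1, \rho)^{\ell_1}[(t_2, \rho)^{\ell_2}] = (t_3, \rho)^{\ell_3})^{\ell} \\
(\mathtt{del}\ t_1^{\ell_1}[t_2^{\ell_2}], \rho)^{\ell} & \to_{n} & (\mathtt{del}\ (t_1, \rho)^{\ell_1}[(t_2, \rho)^{\ell_2}])^{\ell} \\
(m : t_1^{\ell_1}, \rho)^{\ell} & \to_{n} & (m : (t_1, \rho)^{\ell_1})^{\ell}
\end{array}
$$

Figure 13. Labelled environment propagation rules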



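$$
\begin{array}{lrcll}
(\text{Lookup}) & (x, \rho)^{\ell} & \to & v^{\ell} & \text{where } \rho(x) = v \\
(\text{Apply}) & ((\mathtt{fun}(x)\{t^{\ell'}\}, \rho)^{\ell_1}(v))^{\ell} & \to & (t, \rho[x \mapsto v])^{\ell'} & \\
(\text{Unbox}) & (\mathtt{unbox}\ (\mathtt{box}\ v)^{\ell_1})^{\ell} & \to_{1} & v^{\ell} & \\
(\text{Run}) & (\mathtt{run}\ (\mathtt{box}\ v)^{\ell_1}\ \mathtt{in}\ \rho)^{\ell} & \to & (v, \rho)^{\ell} & \\
(\text{IfTrue}) & (\mathtt{if}(\mathtt{true}^{\ell_0})\{t_1^{\ell_1}\}\mathtt{else}\{t_2^{\ell_2}\})^{\ell} & \to & t_1^{\ell_1} & \\
(\text{IfFalse}) & (\mathtt{if}(\mathtt{false}^{\ell_0})\{t_1^{\ell_1}\}\mathtt{else}\{t_2^{\ell_2}\})^{\ell} & \to & t_2^{\ell_2} & \\
(\text{Read1}) & (\{\overline{s : v}, s_i : v_i, \overline{s' : v'}\}^{\ell_1}[s_i^{\ell_2}])^{\ell} & \to & v_i^{\ell} & \\
(\text{Read2}) & (\{\overline{s : v}, \texttt{"\_proto\_"} : \{\overline{s' : v'}\}^{\ell'}, \overline{s'' : v''}\}^{\ell_1}[s_x^{\ell_2}])^{\ell} & \to & (\{\overline{s' : v'}\}^{\ell'}[s_x^{\ell_2}])^{\ell} & \text{if } s_x \notin \overline{s} \cup \overline{s''} \\
(\text{Read3}) & (\{\overline{s : v}, \texttt{"\_proto\_"} : \mathtt{null}^{\ell'}, \overline{s'' : v''}\}^{\ell_1}[s_x^{\ell_2}])^{\ell} & \to & \mathtt{undef}^{\ell} & \text{if } s_x \notin \overline{s} \cup \overline{s''} \\
(\text{Write1}) & (\{\overline{s : v}, s_i : v_i, \overline{s' : v'}\}^{\ell_1}[s_i^{\ell_2}] = v_i')^{\ell} & \to & \{\overline{s : v}, s_i : v_i', \overline{s' : v'}\}^{\ell} & \\
(\text{Write2}) & (\{\overline{s : v}\}^{\ell_1}[s_x^{\ell_2}] = v_x)^{\ell} & \to & \{\overline{s : v}, s_x : v_x\}^{\ell} & \text{if } s_x \notin \overline{s} \\
(\text{Del1}) & (\mathtt{del}\ \{\overline{s : v}, s_i : v_i, \overline{s' : v'}\}^{\ell_1}[s_i^{\ell_2}])^{\ell} & \to & \{\overline{s : v}, \overline{s' : v'}\}^{\ell} & \\
(\text{Del2}) & (\mathtt{del}\ \{\overline{s : v}\}^{\ell_1}[s_x^{\ell_2}])^{\ell} & \to & \{\overline{s : v}\}^{\ell} & \text{if } s_x \notin \overline{s}
\end{array}
$$

Figure 14. Labelled proper reduction rules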



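$$
\begin{array}{lrcl}
(\text{lift-App}) & ((m : (t, \rho)^{\ell_2})^{\ell_1}(v))^{\ell} & \to & (m : ((t, \rho)^{\ell_2}(v))^{\ell})^{\ell} \\
(\text{lift-If}) & (\mathtt{if}((m : v)^{\ell_0})\{t_1^{\ell_1}\}\mathtt{else}\{t_2^{\ell_2}\})^{\ell} & \to & (m : (\mathtt{if}(v)\{t_1^{\ell_1}\}\mathtt{else}\{t_2^{\ell_2}\})^{\ell})^{\ell} \\
(\text{lift-Unbox}) & (\mathtt{unbox}\ (m : v)^{\ell_1})^{\ell} & \to & (m : (\mathtt{unbox}\ v)^{\ell})^{\ell} \\
(\text{lift-RunIn}) & (\mathtt{run}\ (m : v)^{\ell_1}\ \mathtt{in}\ \rho)^{\ell} & \to & (m : (\mathtt{run}\ v\ \mathtt{in}\ \rho)^{\ell})^{\ell} \\
(\text{lift-ReadSel}) & (v_1[(m : v_2)^{\ell_2}])^{\ell} & \to & (m : (v_1[v_2])^{\ell})^{\ell} \\
(\text{lift-ReadRec}) & ((m : v_1)^{\ell_1}[v_2])^{\ell} & \to & (m : (v_1[v_2])^{\ell})^{\ell} \\
(\text{lift-WriteSel}) & (v_1[(m : v_2)^{\ell_2}] = v_3)^{\ell} & \to & (m : (v_1[v_2] = v_3)^{\ell})^{\ell} \\
(\text{lift-WriteRec}) & ((m : v_1)^{\ell_1}[v_2] = v_3)^{\ell} & \to & (m : (v_1[v_2] = v_3)^{\ell})^{\ell} \\
(\text{lift-DelSel}) & (\mathtt{del}\ v_1[(m : v_2)^{\ell_2}])^{\ell} & \to & (m : (\mathtt{del}\ v_1[v_2])^{\ell})^{\ell} \\
(\text{lift-DelRec}) & (\mathtt{del}\ (m : v_1)^{\ell_1}[v_2])^{\ell} & \to & (m : (\mathtt{del}\ v_1[v_2])^{\ell})^{\ell}
\end{array}
$$

Figure 15. Labelled lifts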



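$$
\begin{array}{lrcl}
\text{Abstract values} & \hat{v} \in AbsVal & ::= & \mathrm{NULL} \mid \mathrm{UNDEF} \mid \mathrm{BOOL} \mid \mathrm{NUM} \mid \mathrm{STR} \\
 & & \mid & \mathrm{FUN}(x, e) \mid \mathrm{BOX}(e) \mid \mathrm{REC}(\ell) \\
\text{Abstract variables} & \hat{x} \in AbsVar & ::= & x \mid \ell.p \\
\text{Abstract caches} & \Gamma & : & Label \to \mathcal{P}(AbsVal) \\
\text{Abstract environments} & \varrho & : & AbsVar \to \mathcal{P}(AbsVal)
\end{array}
$$

Figure 16. Abstract domains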



T,g^(1un(x){e})' 

r,g^(ti^(4n)' 

r, g h (box e)' 

r,g\= (unbox 

r, ^ ^ (run t^)^° 

T,g\= (run in p)^" 

r,g^(\i(ti^){ti^}e\SB{4^})' 

r,eh(i,p)' 
r,^H(4l4=])' 

T,g^(dB\ 4^[4^]Y 

T,g^(xa:4^Y 

r, e N W « 

T,g\= FUN(x, e) w fun(a;){e} 

r, e 1= BOX(e) » box e 

r, e h REc(f ) w {iTT^} 
r, e H ~ i' 



if 
if 
if 

and 
if 

and 
if 

and 
if 

and 
if 

and 
if 

and 
if 

and 
if 

and 

if 

if 

and 
and 
if 

and 
and 
if 

and 
if 

and 
if 

for any literal k 



if 
if 
if 



[fc] e r(^) 
g(x) c r(£) 
Vi.r, Ci 

3REC(r) e T(e).\Ji.T(lhl(ei)) C g(e .Si) 
T,g^e 

3v G r(£).r, £- h « fun(a;){e} 

T,q[.4^ hT,Q^4- 

\jYm(x,4') e r(£i).r(4) c ^(a;) a r(4) c t(C) 

T,g\=e 

3v &T(l).V,g\=vKiYiOX e 

VBox(t'^') er(£).r(£') cr(^o) 
VBOx(f^') € r(£).r(f ) c r(4) 

VBox(i'^') € r(^).r(f ) c r(£o) 

r,eh*i' A^,^^h^2' Ar,eh4' 
r(^2) cr(4)Ar(£3) cr(4) 
r, ^; h A r, ^; h p 

VREC(f ) e r(^i).Vs,f' e proto(f )e.p(r.s) c r(£) 
UNDEF e r(^) 

r,ehi^ Ar,eh4' Ar,eh4' 
Vs,REC(f) e r(^i).r(£3) c .s) 
r(4) c T(i) 

T,Q^4- AT,g^4- 
T(h) C T(i) 

T,g^4' 

T(h) c r(£) 

Va; G rfom(p).r, ^ |= p(x) A r(/6Z(p(a;))) C ^(a;) 



'ii3vi G £»(^'.Sj).r, g ^ i/j t j 
V,g\=y^thT,g^ p 



Figure 17. Acceptability judgements 



